The Virtue of Reward: Performance, Reinforcement and Discovery in Case-Based Reasoning
Abstract
Agents commonly reason and act over extended periods of time. In some environments, solving even a single problem requires an agent to make many decisions and take many actions. Consider a robot or animat situated in a real or virtual world, acting to achieve some distant goal; or an agent that controls a sequential process such as a factory production line; or a conversational diagnostic system or recommender system. Equally, over its lifetime, a long-lived agent will make many decisions and take many actions, even if each problem-solving episode requires just one decision and one action. In spam detection, for example, each incoming email requires a single classification decision before it moves to its designated folder; but continuous operation requires numerous decisions and actions.

Reasoning and acting over time is challenging. A learner's experiences may prove unrepresentative of subsequent problems; a changing environment can render the system's knowledge useless. A system that tries to solve hard combinatorial problems, for example, may find, through exploration in the space of solutions, that earlier training examples are suboptimal. Concept drift in spam detection is another example: spammers send new kinds of unwanted email or find new ways of disguising spam as ham. Agents must be highly adaptive if, over time, they are to attain and maintain high standards of, for example, accuracy, coverage and efficiency.

To address these challenges in case-based agents, I have been drawing ideas from another field, that of classifier systems. Classifier systems, first proposed by John Holland, are rule-based systems. They comprise a performance component, a reinforcement component and a discovery component. The performance component chooses the agent's actions. The other two components enable classifier systems to exhibit two kinds of plasticity: parametric plasticity and structural plasticity. The reinforcement component uses feedback from the environment to update rule quality parameters. The discovery component uses genetic operators and other techniques to propose new rules, which may displace existing rules.

I will describe my attempts to build a case-based counterpart to Stewart Wilson's XCS, which is one of the most popular modern classifier systems. I will describe each of its three components. In discussing the reinforcement component, I will offer reflections on the relationship between Case-Based Reasoning and reinforcement learning. In discussing the discovery component, I will offer reflections on automatic case discovery and case base maintenance.
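The division of labour among the three components can be illustrated with a small Python sketch. It is not the system described in the abstract and it is not XCS: the names `Case`, `CaseBasedAgent`, `act`, `reinforce` and `discover` are hypothetical, the quality update is a plain move-toward-the-reward rule, and discovery is reduced to storing a new case and evicting the lowest-quality one rather than applying genetic operators. Under those assumptions, `reinforce` shows parametric plasticity (only numbers change) and `discover` shows structural plasticity (the case base itself changes).

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Case:
    # Hypothetical case structure: a problem description, the action taken,
    # and a quality parameter that reinforcement adjusts over time.
    problem: Tuple[float, ...]
    action: str
    quality: float = 0.5


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two problem descriptions."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class CaseBasedAgent:
    def __init__(self, cases: List[Case], k: int = 3,
                 learning_rate: float = 0.2, capacity: int = 50) -> None:
        self.cases = list(cases)
        self.k = k
        self.learning_rate = learning_rate
        self.capacity = capacity

    def act(self, problem: Sequence[float]) -> Tuple[str, List[Case]]:
        """Performance: retrieve the k nearest cases and let them vote,
        each vote weighted by the case's current quality."""
        neighbours = sorted(
            self.cases, key=lambda c: distance(c.problem, problem)
        )[: self.k]
        votes: Dict[str, float] = {}
        for case in neighbours:
            votes[case.action] = votes.get(case.action, 0.0) + case.quality
        action = max(votes, key=votes.get)
        # Return the cases that backed the chosen action so they can be rewarded.
        return action, [c for c in neighbours if c.action == action]

    def reinforce(self, used: List[Case], reward: float) -> None:
        """Reinforcement (parametric plasticity): move each used case's
        quality a fraction of the way toward the observed reward."""
        for case in used:
            case.quality += self.learning_rate * (reward - case.quality)

    def discover(self, problem: Sequence[float], action: str) -> None:
        """Discovery (structural plasticity): add a new case; if the case
        base exceeds its capacity, the lowest-quality case is displaced."""
        self.cases.append(Case(tuple(problem), action))
        if len(self.cases) > self.capacity:
            self.cases.remove(min(self.cases, key=lambda c: c.quality))
```

In a spam-filtering loop, for instance, one might call `act` on each incoming email's feature vector, call `reinforce` with a reward of 1.0 or 0.0 once the user's feedback arrives, and call `discover` with the corrected label whenever the agent was wrong.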
Similar resources
Improving Agent Performance for Multi-Resource Negotiation Using Learning Automata and Case-Based Reasoning
In electronic commerce markets, agents often need to acquire multiple resources to fulfil a high-level task. To obtain such resources they must compete with each other. In multi-agent environments that involve competition, negotiation is an interaction between agents aimed at reaching an agreement on resource allocation and coordinating with each other. In recent ...
Kant’s Philosophy of Religion and the Challenges of Moral Commitment
Kant believes that the concepts of a just and compassionate God and the life beyond death spring from our rational need to unite happiness with virtue. But since Kant had banished happiness from any place in moral reasoning, his philosophy of religion has been deemed not merely discontinuous with his ethics but radically opposed to it. This article tries to argue against this apparent incon...
The Introduction of a Heuristic Mutation Operator to Strengthen the Discovery Component of XCS
The extended classifier system (XCS) tries to solve learning problems online by producing a set of rules (classifiers). XCS is a rather complex combination of a genetic algorithm and reinforcement learning: it uses the genetic algorithm to discover promising rules and values them by reinforcement learning. Among the important factors in the performance of XCS is the possibility to...
Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)
In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide a mobile robot through dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...
Publication date: 2005